Planar diagrams from optimization 
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We propose a new toy model of a heteropolymer chain capable of forming planar secondary 
structures typical for RNA molecules. In this model the sequential intervals between neighboring 
monomers along a chain are considered as quenched random variables. Using the optimization 
procedure for a special class of concave-type potentials, borrowed from optimal transport analysis, 
we derive the local difference equation for the ground state free energy of the chain with the planar 
(RNA-like) architecture of paired links. We consider various distribution functions of intervals 
between neighboring monomers (truncated Gaussian and scale-free) and demonstrate the existence 
of a topological crossover from sequential to essentially embedded (nested) configurations of paired 
links. 



I. INTRODUCTION 



Both DNAs and RNAs are heteropolymers consti- 
tuting of four different nucleotide types. The pecu- 
liarity of RNA chains consists in the additional free- 
dom of the formation of complex secondary structures. 
These secondary (intra-molecular) structures are stabi- 
lized by theromoreversible hydrogen bonds between non- 
neighboring nucleotides and mostly take a "cactus-like" 
hierarchically folded form, topologically isomorphic to a 
tree. The structures which do not belong to this tree-like 
class are known in the literature as "pseudoknots" , and 
in most cases are highly suppressed. The main task of 
any computational algorithm predicting the secondary 
structure of RNA can be formulated as a search for a 
secondary structure with the lowest value of the free 
energy ("ground state") among all allowed cactus-like 
structures. 

Construction of an effective dynamic programming al- 
gorithm (DPA) to predict RNA-like secondary structures 
is a much more challenging problem than that for a clas- 
sical DNA-matching problem (see pHll). In the simplest 
possible case the generic DPA allowing to calculate the 
cost function and to find the ground state structure of 
an RNA-type polymer with a given primary sequence, 
is as follows. Suppose that a given chain consists of n 
monomer units, each unit chosen from a set of c different 
types (letters) A, B, C, D, . . . . These units can form 
noncovalent bonds with each other, at most one bond 
per unit. The energy of a bond depends on which let- 
ters are bonded, the simplest choice is to assign some 
attraction energy u to the bonds between similar letters 
(A-A, B-B, C-C, . . . ) and zero energy the bonds be- 
tween different letters (A-B, A-D, B-D, ...). In real 
RNAs matches are the interactions between complimen- 
tary nucleotides rather than similar ones, which gives rise 
to a slightly different matrix of interactions. However, at 



least for random RNAs this difference is irrelevant: it 
is important that the fraction of possible matches is -, 
the rest corresponding to mismatches. Schematically the 
secondary structure of RNA chain is shown in Fig. [T£i. 
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Figure 1: (a) Schematic cactus-like secondary structures of 
an RNA-like chain; (b) the height diagram for (a) represented 
by a Motzkin path. 

The simple model, serving as a "shooting range" for 
theoretical consideration of secondary structures forma- 
tion typical for the ensemble of messenger RNAs, is as 
follows. Let us neglect the contribution of loop factors 
to the partition function and variation in the energies of 
different types of complementary nucleotides, avoid the 
constraints on the minimal size of loops in the structure, 
and disregard the stacking interactions (the cooperativity 
in bonds creation between adjacent pairs of monomers). 
What is preserved only, is the possibility of a formation 
of a cactus-like folded configurations for any arbitrary 
sequence of nucleotides. The partition function of this 
model is known (see, for example, pMlOj) to satisfy the 
recursion relation: 



i+k 



9i,i+k — 9i+l,i+k + 2_^ Pi,s9i+l,s-l 9a+l,i+k] 



s=i+l 



(1) 



9i,i = 9i+i,i = 1. 
The term gij describes the contribution to the partition 



function of the part of the sequence between monomers 
i and j. The Boltzmann weights 



ft. 



■"■-' /T , 1 < i < j < n 



(2) 



are the statistical weights of bonds, and the "bound- 
ary conditions" g^ = g%+\ t i = 1 take care of the un- 
paired bonds. Expression ([T]) is convenient for recursive 
computation. The energy of the ground state, F\ n = 
lim — Tlngi^, is the free energy of the system at zero 

temperature, so it can be calculated as follows: 



F 



i,i-\-k 



lim -T]ng ii+k = min I F i+ i i+k , 

'— 5-+0 I 

[u i>s + Fj+i ]S _i + F 8+ i )i+k ] j. (3) 
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been supposed that Ui.j is a quenched uncorrelated ran- 
dom function of i and j, having a Gaussian distribution. 
Within such a model it has been demonstrated that the 
presence of a frozen heteropolymer structure of a chain 
plays a crucial role: due to the frustrations in the primary 
sequence, the system exhibits a glass transition |£| Q. 
The quenched randomness in the primary sequence af- 
fects also the height diagram. It was found numerically 
that in glassy state of random RNA the roughness expo- 
nent 7 takes the value close to 7 = 2/3. Recent analytic 
estimates by field-theoretic arguments and RG analysis 
[lj] give 7 ~ 5/8. Despite the essential progress in the 
field, to our point of view, the question about the value 
of roughness exponent for random heteropolymer RNAs 
is still open. 



The geometry of the secondary structure becomes very 
transparent if one represents binding of monomers by so- 
called "height diagram" Q depicted in the Fig. [Tb. That 
is, construct an auxiliary one-dimensional walk accord- 
ing to a following rule. Start from x = and at each dis- 
crete time tick allow a step of ±1, or 0. If the monomer 
i in the original cactus-like structure is connected to a 
monomer j and i < j, then i-th step of the walk is "up." 
If i is connected to such a j that i > j, then the corre- 
sponding step is "down." If i is not connected with any 
other monomers, then the walker at i-th step stays put. 
Clearly, thus defined trajectory returns to zero after n 
steps and remains non-negatve for all < i < n, i.e. stays 
in the domain (x > 0, i > 1) on (x,i)-plane. Such tra- 
jectories, being discrete Brownian excursions, are called 
Motzkin paths [11}. It is clear from the comparison of 
Fig. [T]i and b that there exists a one-to-one correspon- 
dence between cactus-like RNA secondary structures and 
height diagrams represented by Mozkin paths. Namely, 
the height of the point in the height diagram equals to the 
number of arcs going above the corresponding point on 
the arc diagram, i.e. coincides with the number of bonds 
one has to break to reach the corresponding monomer 
from the starting point of the chain. An important sta- 
tistical characteristic of the state of the system is the 
so-called "roughness exponent," 7, which links the mean 
height, (h), of such a diagram with the length, L, of the 
chain: (h) ~ L 7 , < 7 < 1. 

For homopolymcr RNAs, the interaction energies Ujj 
take one and the same value u independent from i and j. 
It is well known that for the uniform model Eq. ([T]) can 
be easily solved exactly by generating functions method 
[111 ] . This model displays the existence of a 2nd order 
phase transition from unpaired to strongly paired regime 
at u = u CT . The roughness exponent, 7, for a height 
diagram is typical for randomly branched homopolymcr, 
7 = 1/2. 

The investigation of thermodynamic properties of ran- 
dom RNA-like chains is addressed in a number of recent 
theoretical papers [8|, 0, [l2T[l4| . In these works it has 



II. THE RANDOM INTERVAL MODEL: 
SUBADDITIVITY AND SUBMODULARITY 



Let us begin with some general definitions relating 
topology of planar diagrams and optimization. Following 
R. McCann [191 ] , we call the function w a cost function 
of concave type if for any x\, x 2 , j/i, y 2 £ K the inequality 

w(xi,yi) + w(x 2 ,y 2 ) < w(x 1 ,y 2 )+w(x 2 ,y 1 ) (4) 

implies that the intervals connecting xi to y\ and x 2 to 
y 2 are either disjoint or one of them is contained in the 
other. Examples are: w(x,y) = \x — y\ a with < a < 1, 
or w(x,y) = In | x — y\ extended to the diagonal x — y by 
—00. In fact, whenever a cost function w of concave type 
is spatially homogeneous and symmetric, i.e., w{x,y) = 
g(\x — y\), the function g must be strictly increasing and 
strictly concave [19(. Let now x\ < x 2 < ■ ■ ■ < x 2n be an 
even number of points on the real line R. Consider the 
complete graph K 2n on these points, each of whose edges 
(xi, Xj) is equipped with a weight w(xi, Xj). We look for 
a minimum- weight perfect matching in the graph K 2n , 
i.e. , for a set of n edges such that the sum of their weights 
is minimal. 

A bipartite version of the graph matching problem has 
been thoroughly treated for costs of concave type in the 
continuous setting in [191 ]. Similar discrete versions have 
also been considered in the literature on optimal algo- 
rithms construction for the specific case of the distance 
\x—y\ [15l. [l8l.[20l and for a general cost function of a con- 
cave type in [161 ]. Call a matching planar if, for any two 
arcs (xi, ?/j) and (xj, yj) that are present in the matching, 
the corresponding intervals in M. are either disjoint or one 
of them is contained in the other. In [151 |l9( it is proved 
that a minimum- weight matching is planar. 

In Fig. [2] we rephrase this theorem pictorially. Taking 
weights w(xi,yi) = ln|xj — j/i|, we can straightforwardly 
check that the minimal value of the total cost function 



Ct(xi,yi;...;x n ,y n ), where 

n(x 1 ,y 1 ;...;x n ,y n )= ^ \n\xi - y t \, 

{arcs} 

is achieved at some planar configuration of pairings. 
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Figure 2: Optimization with the concave-type cost function 
leads to the planar pairing. 

Now we are in position to formulate our toy Random 
Interval Model (RIM) of a quenched heteropolymer RNA, 
in which the paired monomers interact with the energy 
Eij , which is a concave function of the distance between 
monomers along the chain. In particular, we choose Eij 
of the form 



Eij = u\n\xi 



"Jll 



(J * i) 



(5) 



where u is some positive constant, and Xi,Xj are the co- 
ordinates of monomers i and j along the chain. The 
distances di = \xt+i — Xi\ along the chain between se- 
quential monomers capable to form pairs are quenched 
random variables taken independently from some distri- 
bution P(di = d). Schematically, a typical realization of 
a RIM is depicted in Fig. [3J by arcs (a) and by a height 
diagram (b). 




Figure 3: Typical configuration of a random interval RNA, 
shown (a) by arcs, and (b) by a height diagram. 

Let us emphasize that the key feature of the RIM 
consists in the fact that the interaction energy between 



with the "boundary conditions" i^+i,i = for any i. 
Note that it is enough to extend the min in s over values 
with odd increments with respect to i: no arc can cover 
an odd number of points, because otherwise some of them 
would be excluded from the structure due to planarity. 

1. It is easy to see that the recursion © enumerates 
all planar arc structures on points Xi, Xi+i, ■ ■ ■ , Xi+k- In 
particular it implies that 



for all i and all odd fc > 1 and that 

Fi,i+k < Fi,i+£ + -Fi+^+M+fc 



(7) 



(8) 



for all i and 1 < £ < fc with fc, £ odd. The latter property 
can be described as subadditivity of the functional F: for 
two non-overlapping configurations of points x\ < xi < 
■ ■■ < x i+e and Xi+e+i < x i+e+2 < ■■ ■ < x l+k , the value 
Fi t i+k for the united configuration is not greater than the 
sum of the values Fi t i + i and Fi + i + i^ + k on the two partial 
configurations. 



Eij of concave 



2. For the cost function w(xi,Xj) 
type, the free energy functional is not only subadditive, 
but enjoys a stronger property: for all i, odd 1 < £ < k 
and even j with j < £ + 1 , F verifies the inequality 



Fi t i+k + Fi + j t i + i < Fi,i + £ + Fi + j t i + k 



(9) 



of which (J5J) is a particular case corresponding to j = 
£ + 1. This property of F is called submodularity: note 
that it is similar to (0]) when x% < x% < y% < y\. It 
suffices to establish submodularity for j = 2 and £ = fc— 2: 



F 



1,-i+fc 



< Fu 



+/c-2 



F, 



i+2,i+k 



F, 



+2,i+k-2i 



(10) 



the general case ([9]) is recovered by induction. Indeed, it 
was established in [T7| that F satisfies a recursion 

Fi^+k = minfe^i+fc + F i+ i^-i] 

Fi^i+k-2 + Fi + 2,i+k — Fi + 2,i+k-2\ (11) 



that combines ([7]) and (fit)]) . In other words, F is the 
maximal submodular functional that satisfies also ((7]). 

Thus, the function -Fi.i+fc for concave-type potentials 
satisfies not only the standard nonlocal Eq.©, but also 
a local Eci. tfTTT) . For completeness a derivation of ([TTj) 



paired monomers, e itj , is a concave function of distance. taken from [171 is included in Appendix 
In principle, one could take Eij in the form e 



1 , where < ct\ < 1, or s, 



<:) 



»|J 



where a-i > (j ^= i). The main conclusions will survive, 
though the details are model-dependent. 

Supposing that every monomer in the ground state 
structure is involved in binding, after some simplifica- 
tions we get from ([3]): 



"*2,2+A: 
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s— i+1 ,«+3, . . . ,i+k 



F 



i+l,g— 1 



F. 



s+l,i+k 



(G) 



III. THE TOPOLOGICAL PROPERTIES OF 
THE RANDOM INTERVAL MODEL 



The random interval model defined above has some 
interesting topological features. Namely, the height dia- 
gram, h, which can be regarded as a quantitative charac- 
teristics of the "nesting degree" of planar arcs, displays 



for the Gaussian distribution of intervals a topological 
crossover from sequential pairing of monomers to essen- 
tially embedded (i.e. nested) one. Another interesting 
behavior of h is observed for the power-law (i.e. scale- 
free) distribution of intervals, where the dependence of 
the height on the the exponent in the distribution has a 
well-defined maximum. 



Numerical results 



The truncated Gaussian distribution 



Consider a random chain, in which the distances be- 
tween nearest-neighboring monomers, di = \x%+i — Xi\, 
arc distributed with the truncated Gaussian distribution: 



f(d,a) = 



C _i±zj4 
e z<*' 2 



2-ko- 



crf 



d„ 



<d<d n 



otherwise 



erf 



f-i—d n 



(12) 



is the 



where C = , ^ y-^^- } ™ v ^^- 
constant determined by the normalization condition 
/ d max f(x, a) dx = 1. To avoid any possible misunder- 
standings, require all energies in ([5]) to be positive. With- 
out te loss of generality we can chose the following values 
of the parameters of the distribution function in (fT2"j) : 
\x = 2; d m [ n = 1; <i max = 3. The distribution function 
(|T2"j) is depicted in the Fig. 2] for different dispersions a. 




Figure 4: Truncated Gaussian distribution f(a) of distances 
between nearest-neighboring monomers, a = 0.1; 0.5; 2.0. 



For a < er cr , i.e. for essentially peaked distributions, 
the ground state of a random RNA chain has a height 
equal to 1 . This means that only sequential pairs of near- 
est neighboring monomers do form bonds. The value tr cr , 
at which the height diagram exceeds 1, we call the topo- 
logical crossover point. The value a CT is computed for fi- 
nite chains and depends on its total length, TV; when TV is 
increasing, the point of transition shifts towards smaller 
values and, apparently, reaches zero when TV tends to in- 
finity. The figure [5] presents our numerical results for ran- 
dom interval chain with TV = 250, 500, 1000 monomers. 

Above the crossover point, i.e. for a > cr cr the height 
diagram monotonically increases with a and reaches some 
averaged stationary value for the RIM with uniform dis- 
tribution of intervals (a — > oo). We prefer to use the 
term "crossover" instead of "transition" since we expect 
that it is not a true phase transition, the width of which 
shrinks to zero in the thermodynamic limit. 
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Figure 5: Dependence of the average height, (h) on the control 
parameter a for the Gaussian truncated distribution. 



2. The power-law distribution 



The truncated Gaussian distribution considered above 
is good for testing the key features of the RIM of RNA- 
like chains, however itself this distribution is rather arti- 
ficial. It is much more natural to consider the scale-free 
(power-law) distributions of distances between neighbor- 
ing monomers. In this case the intervals di have the fol- 
lowing probability density function: 



Our numerical analysis shows the existence of a 
crossover for random interval RNAs in topology of 
monomer pairings (planar diagrams) from sequential to 
essentially nested one. The parameter which controls this 
behavior is the dispersion a of the distribution f{d,a). 



/(d,7) 



C 



i + dn 



(13) 



We consider all values 7 > and truncate the distri- 



bution (|13[) outside the interval d n 



< d < dr, 



The 



normalization constant C = C 7 (d max ,<i m i n ) is 



C(d 



maxi ("lain) — L TV rnax 7 -^7 v^minJJ 



A~,(x) 



iFi (1,7 ,1 



7 1 , —x 7 ) x 



(14) 



where 2-F\(---) is the hypergeometric function. In what 
follows we take the following numerical values: d m in = 
1; rfmax = 20. In contrast to the truncated Gaussian 
distribution, in the truncated scale-free distribution the 
probability of very long distances between neighboring 
monomers is not exponentially small. 




Figure 6: Power-law distribution function f(d, 7) of distances 
between nearest-neighboring monomers, 7 = 0.1; 1.0; 2.0. 

The presence of "heavy tails" in the distribution (|13p 
affects the topology of the ground state of the RNA RIM 
in a nontrivial way. Indeed, when 7 in (|13[) is increas- 
ing from zero, the "nesting degree" , h, behaves non- 
monotonically: at small 7 > it increases up to some 
maximal value (at 7 = 1) and then decreases, tending to 
1 (for 7 — >• 00) - see the Fig. [7) 

It is worth to note that the presence of "heavy tails" 
in the distribution releases the creation of nested config- 
urations in an optimal pairing. For large values of 7 the 
height diagram decreases which, as in the case of Gaus- 
sian distribution, corresponds to weakly random (practi- 
cally equidistant) RNAs with sequential optimal pairing. 
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Figure 7: Dependence of the height, (h) on the control pa- 
rameter 7 for the truncated power-law distribution. 



could be favorable if below this arc all pairs of neighbor- 
ing monomers have formed bonds. Creation of a covering 
arc involves a global reorganization of linked pairs in a 
RIM. To the contrary, the nesting discussed above, is the 
local property of the RIM due to the special arrangement 
of sequential triples. 

Let us focus on the nesting in an optimal configura- 
tion dealing with local properties of a RIM. According to 
(p?|)~ (fTtJ)) the nested configuration of two arcs is favorable 
with respect to the sequential pairing, if the following in- 
equality for the values Wj-i^+a, Ui-i,i, 0Ji,i+i, Wi+i,i+2 
holds: 



^i-M+2 + Wt.-i+l < Ui-l,i + Ui+l,i+2 



(15) 



Taking into account that Wjj = u In \xi — Xj | , we can eas- 
ily transform (|15|) into the condition on three sequential 
intervals dj_i, di, di+\: 



rfi-i > di 

di(di-i +di) 



di+i > 



(16) 



di-i — d{ 



or in a more symmetric form 



Analytic estimates 



di < 



+ di+i 




(di-i + d i+1 ) 



- 1 



(17) 



The nesting in an optimal configuration of RIM is af- 
fected two complimentary factors. On one hand, the 
nesting becomes favorable under some condition (explic- 
itly written below) on lengths of three sequential inter- 
vals di-i, di, di+\. On the other hand, the creation of 
a covering arc between two distant monomers i and j 



It can be easily checked that (fl7|) implies the first inequal- 
ity (fTtJj) . Having the distribution f(d) (Gaussian, defined 
by (fT2|) . or power-law, defined by (fTB")) ) truncated outside 
of the interval [d m in,dmax], we can compute the proba- 
bility P that inequalities f|16[) hold. Since the intervals 
dj_i, di, di + \ arc distributed independently, the desired 



probability P is determines by the integral 



P = 



f{x) dx 



f(y) dy 


- 1 ) 

f(z)dz, 




(V 1+ (-+») 2 


(18) 



where integration over x corresponds to dj_i, 
to di + i, and over z, to d,. 



over y, 



Equation (JT8J) describes appearance of 1st level nesting 
(h = 2). Moreover, it is present as a multiplier in the 
probability of the 2nd level nesting (h = 3). So, we 
can expect that numerical curves for h(o) or h(j) have 
the same features as the function (TT5)) for distributions 
f(d,a) (Gaussian) and f(d, 7) (power-law) respectively. 



Gaussian truncated distribution 



Substituting the truncated Gaussian distribution 
f(d, a) (see Eq. (fT2"l) ') with the parameters \x = 2; d m i n = 
1; tax = 3 for /(d) in Eq. (|18[) . we get the function P 
plotted in the Fig. \8\ Note that P(cr) repeats the profile 
of (h(cr)} displayed in the Fig. [5] for the average height of 
the arc diagram. However our analytic approach does not 
take into account the slight dependence of the transition 
point on the polymer length since this effect has "global" 
property and is beyond the precision of our method. It 
should be also emphasized that the appearance of the 
2nd-level nesting (i.e. of the diagrams with the heights 
h > 2) deals exclusively with global reorganization of 
pairing in the RIM. Indeed, in order to have the 2nd level 
nesting, the condition (|16[) should be valid for the inter- 
vals di-2, d^ 1 ' , di+2, where we substitute for the mid- 
dle interval dS 1 ' the combination of neighboring triples, 
di—i + di + dj+1, which itself is nested. The minimal 
value for the middle interval d^\ as it follows from (fT5)l . 
is g^ 1 ) = 2(^/2 + l)d m i n + d m i n . For the parameters of our 
distribution, we can conclude, that d^- 1 ' > <i max , what 
contradicts with the definition of the model. It means 
that all the configurations with the h > 2 have at least 
one long "global" arc. 




Figure 8: Dependence of the probability P (see (jTSJ) on the 
control parameter a for the truncated Gaussian distribution. 



is forbidden, because d (2) = 2(v^+ l)d (1) + d (1) > <2 max . 
So, in the configurations with h > 3 the nesting is again 
due to "global" factors. 
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Figure 9: Dependence of the probability P (see (|18|l ) on the 
control parameter 7 for the truncated power-law distribution. 



2. Power-law truncated distribution 



The same analysis can be performed for the RIM with 
the power-law distribution f(d, 7) (sec Eq. (TT3^) ). The 
presence of nested structures in an optimal pairing is de- 
termined by the function P (|18p . which now depends on 
the parameter 7 in the distribution (TT3"|) . We see that the 
function P(~/) has the maximum at the point 7=1. At 
7>1 the probability P tends to zero. Contrary to the 
truncated Gaussian distribution, the 2nd level nesting is 
allowed since d^ 1 ' < <i max , however the 3rd level nesting 



IV. CONCLUSION 



In this paper we have proposed a new model of a het- 
cropolymer chain with RNA-type topology of secondary 
structure and quenched random distribution of intervals 
between neighboring monomers. For quantitative anal- 
ysis of the Random Interval Model (RIM), we have in- 
vestigated the statistical behavior of "height diagrams" 
as a function of the control parameter in the distribution 
function of intervals. 

We have shown that for truncated Gaussian distri- 
bution f(d,a) of intervals (see Eq. (TT2"|) ), the height 



diagram exhibits a topological transition in pairing of 
monomers from sequential to essentially nested one. The 
parameter which controls this behavior is the dispersion, 
a, of the distribution f(d, a). 



In contrast to the truncated Gaussian distribution, 
for the truncated scale-free distribution f(d, 7) (see Eq. 
dT3I)) the probability of very long distances between 
neighboring monomers is not exponentially small. The 
presence of such "heavy tails" , or, in other words, of the 
"intermittent behavior" (i.e. very long tails mixed with 
very short ones) nontrivially affects the topology of the 
ground state of the RNA Random Interval Model. In- 
deed, when 7 in (|13p is increasing from zero, the "nesting 
degree", h, behaves non-monotonically: at small 7 > it 
increases up to some maximal value (at 7=1) and then 
decreases, tending to 1 (for 7 — > 00). 



The important result deserving attention, concerns the 
possibility to pass from the nonlocal recursion relation 
for the ground state free energy ^ to the local recursion 
relation (|11|) if and only if the interaction energy between 
paired monomers, e^j, is a concave function of distance. 
So, for any potential (even random) of concave form, the 
equation (fTTj) (and, hence, Eq. (JTJ|) can be essentially 
simplified resulting in shortening the computational time 
if these equations are implemented for numeric analysis 
of secondary structures of polymer chain with RNA-type 
architecture. 



The final remark concerns the possible interplay be- 
tween optimization problems and some particular results 
of the Random Matrix Theory (RMT) for RNA folding, 
addressed in [12J, [l3[ • Let us recall that our basic result 
relays on the theorem which proves that optimal pairings 
on the line with the concave transport function are non- 
intersecting (i.e. planar) - see, for example, the Fig. [5] 
Being formulated in RMT terms, this means that opti- 
mization leads to the extraction of a special subclass of 
planar diagrams in the large-TV random matrix ensemble, 
namely, the so-called rainbow diagrams - see, for exam- 
ple, [21j . To this end it would be interesting to formulate 
our Random Interval Model as a matrix model for finite 
N in order to check how the optimization algorithms al- 
low extract planar diagrams of special topology in matrix 
models. 
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Appendix A: Derivation of Eq. (|lll) 



Suppose A = {.t ? ;}i<j<2„ with x\ < x 2 < • • • < x 2n 
and A' = {x' i ,}i<i><2n> with x' x < x' 2 < • • • < x' 2n , are 
two sets such that x 2n < x'±, i.e., X' lies to the right 
of A. 

We will refer to minimum-weight perfect matchings on 
X and A', i.e., planar (nonintersecting) sets of n (resp. 
n') arcs connecting the points such that the sum of their 
weights, which are given by a cost function w(-, ■) of con- 
cave type, is minimal, as partial matchings and to the 
minimum-weight perfect matching on X U X' as joint 
matching. 

Call an arc (xi,Xj) in a nested matching exposed if 
there is no arc (a;,;/ , Xj> ) with Xi, Xj contained between xy 
and Xj> . We call all other arcs in a nested matching non- 
exposed or hidden. Intuitively, exposed arcs are those 
visible "from above" and hidden arcs are those covered 
with exposed ones. 

We first show, following [17j , that whenever an arc 
(xj, Xj) is hidden in the partial matching on A, it belongs 
to the joint optimal matching and is hidden there too. 
By contradiction, assume that some of hidden arcs in 
the partial matching on A do not belong to the joint 
matching. Then there will be at least one exposed arc 
(xi,x r ) in the partial matching on X such that some 
points Xi with xi < x 2 ; < x r are connected in the joint 
matching to points outside (xt,x r ). 

Denote all the points in the segment [xg, x r ] that are 
connected in the joint matching to points on the left of xi 
by z\ < z 2 < ■ • • < Zk', denote the opposite endpoints of 
the corresponding arcs by y\ > y 2 > ■ ■ ■ > yk, where the 
inequalities follow from the fact that the joint matching 
is nested. Likewise denote those points from [x^,x r ] that 
are connected in the joint matching to points on the right 
of x r by z[ > z 2 > ■ ■ • > z' k , and their counterparts in 
the joint matching by y[ < y' 2 < ■ ■ ■ < y' k , . Observe that 
although k or k' may be zero, the number k + k' must be 
positive and even. 

Consider now a matching on the segment [x£ , x r ] that 
consists of the following arcs: those arcs of the joint 
matching whose both ends belon g to [x^,x r ]; the arcs 
(z 1: z 2 ), ..., (z 2k -i,z 2k ), where [H] k = [k/2\; the 
arcs (z^z'-l), ..., (z' 2k ,, z^k'-i)' where k' = [k'/2\; and 
(zk,z' k ,) if both k and k' arc odd. Denote by W the 
weight of this matching. It cannot be smaller than the 
weight Wq of the restriction of the optimal partial match- 
ing on A to [x£, x r \. For the total weight W of the joint 
matching on A U X' we thus have 



W > W - W 



W^. 



(Al) 



We now show that by a suitable sequence of uncross- 
ings the right-hand side here can be further reduced to 
a matching whose weight is strictly less than W. 



The arcs (zi,yx) and (xg,x r ) are crossing, so that 
w(yi,zi)+w(xe,x r ) > w(yi,xe) +w(zi,x r ). Uncrossing 
these arcs strictly reduces the right-hand side of (|A1|I : 

W > W - W + W' {) 

-w(yi,zi) -w(xi,x r ) +w(y 1 ,xi) + w(z ll x r ). 



J 



Now the arcs (2/2, 22) and (z\,x r ) are crossing, so w(y 2 , 22) + w(^i,x r ) — w{z\, z%) > w(y2,x r ) and therefore 

W > W - W' + W' Q -w{yx,z\) -w{y 2l z 2 ) -w(x e ,x r ) + w(yi,xt) +w{z 1 ,z 2 ) + w{y 2 ,x r ). 

I 

Repeating this step k = [k/2\ times gives the inequality outside [xg, x r ] are eliminated from the matching, except 

possibly (yk, z k ) if k is odd. 

W > W — W + Wn — w(x£, x r ) - y^ w(yi,Zi) T , . 11 iV . -i j 

u \ c, 1 j ^ ya i, 1, \x \s> now clear by symmetry that a similar reduction 

~'~ step can be performed on arcs going from z[, z' 2 , . ■ ■ to 

+ y^ w(z2i-i,Z2i)+ }; w(y2i^i,y2i-2)+'w(y2 K ,Xr), the right. 



Ki<i- 



Ki<K 



Finally if k and k' are odd, we uncross the pair of arcs 
where in the rightmost sum yo is defined to be X£. Note {y k ,Xk) and {yk-i,y' k >-i and finally the pair (zk,y' k ,_i) 
that at this stage all arcs coming to points zi, z 2 , . . . from and (z' k ,,y k ,). 



The final estimate for W has the form 

W>W-W' + W^-w(xe,x r )- Y, w( yi ,Zi)- ^ w{z' il ,y' i ,) 

\<i<k l<i'<k' 

+ ^ w(z 2 i-\,z 2 i) + Y w ( z 2i'' z 2i'-i) +w(z k ,z' k ,) ■ [k, k' are odd] 

l<i<K l<i'<K, f 

+ 5Z w (V2 t -i,y2z-2) + ^Z w (y 2 i>- 2 ,y 2 i>-i) + w{y k ,y k ,) ■ [k, k! are even], (A2) 

where notation such as [fc, kl are odd] means 1 if k, k' are odd and otherwise. 

I 



The right-hand side of (|A2p contains four groups of 
terms: first, 

w - Yl w (.V^ z i)- Yl w(4>2/i')> 



Ki<k 



Ki'<k' 



corresponding to the joint matching without the arcs con- 
necting points inside [xe,x r ] to points outside this seg- 
ment; second, 

W- Y, w{z2t~l,Z 2i )- ^ W(4i',4i'-l) 
l<i<K l<i'<K' 

— w(z kl z' k ,) ■ [k, k' are odd], 

which comes with a negative sign and corresponds to the 
arcs of the joint matching with both ends inside [xi , x r ], 
and cancels them from the total; third, Wq — w(xi,x r ), 



with positive sign, which corresponds to the hidden arcs 
of the partial matching on X inside the exposed arc 
(xi,x r ), not including the latter; and finally the terms 
in the last line of (|A2p . corresponding to the arcs match- 
ing Xi, x r , and points y x , . . . ,y k , y[, . . . , y' k ,, i.e., those 
points outside [x^,x r ] that were connected in the joint 
matching to points inside this segment. 

Gathering together contributions of these four groups 
of terms, we observe that all negative terms cancel out 
and what is left corresponds to a perfect matching with a 
weight strictly smaller than W, in which all arcs hidden 
by (x£,.T r ) in the partial matching on X are restored. 
There may still be some crossings caused by terms of 
the fourth group and not involving the hidden arcs in 
[xi, x r ]] uncrossing them if necessary gives a nested per- 
fect matching whose weight is strictly less than that of 



the joint matching. This contradicts the assumption that 
the latter is the minimum- weight matching on X U X' . 
Therefore all hidden arcs in the partial matching on X 
(and, by symmetry, those in the partial matching on X') 
belong to the joint matching. 

Now let z, j be indices of opposite parity and such that 
i < j , and define Wi.j to be the weight of the minimum- 
weight perfect matching on the j — i + 1 points xi < 
%i+i < ■ ■ ■ < Xj. We can now show, following [13], that 
for all indices i, j of opposite parity with 1 < i < j < 2n, 
weights Wi.j satisfy the recursion 



Wij = min [w(xi,Xj) + W i+ ij-i; 
Wij-a + W i+ 2,j 

with "initial conditions" 



W, 



i+2,j-2\ 



(A3) 



and Xk+i < ■ ■ ■ < Xj-2 coincide with the (possibly 
empty) matchings Wi+2,£-i and Wk+i,j-2- For the same 
reason the (possibly empty) part of the matching Wij-2 
supported on x^+\ < • • • < Xk—i coincides with We+ik—i- 
Therefore 



Wi,j-2 = w(xi,x k ) +w(Xi+l,Xt) 

+ W i+ 2,e-i + W e+ i,k-i + W k +i,j-2- 



(A6) 



Taking into account (|A4[) . observe that in the case k — 
i + 1 and I = i, which was left out in (|A5[) . this expression 
still gives the correct formula W^j-2 = w(xi,Xi+i) + 

W i+ 2,j-2' 

Now assume that in the matching Wi+ij the point Xj 
is connected to xy and the point Xj-\ to xy- A similar 
argument gives 



W, 



= 0, Wi 



i + 2. 



= -w(Xi,Xi+l). 



(A4) 



For simplicity we will refer to the minimum-weight 
perfect matching on points x r < x r +i < ■ ■ ■ < x s as 
the "matching W r . s ." Consider first the matching that 
consists of the arc (xj , Xj ) and all arcs of the match- 
ing Wi+ij—i, and observe that by optimality the lat- 
ter its weight w(xi,Xj) + Wi+ij-i is minimal among all 
matchings that contain (xi,Xj). 

We now examine the meaning of the expression 
Wij-2 + Wi + 2,j — Wi + 2,j-2- Denote the point connected 
in the matching Wij-2 to Xi by Xk and the point con- 
nected to Xi+i by X£. It is easy to see that the pairs of 
indices i, k and i+l,£ both have opposite parity. Assume 
first that 



Xi+l < XI < X k < Xj-2- 



(A5) 



Observing that hidden arcs in partial matchings on the 
sets X = {xi, Xi+\\ and X' = {xj + 2, • ■ • , Xj-2} are pre- 
served, and taking into account parity of k and I, we see 
that Xk and xg (as well as their neighbors Xk+i and xg-\ if 
they are contained in [xj+2, Xj-2]) belong to exposed arcs 
of the matching Wi+2,j-2- Thus the matching Wij-% has 
the following structure: 




%i %i+l ~~ v Xl 



W< 



•'•';, 



+i,fc-i 



W, 



fe+i,j— 2 



where dashed (resp., dotted) arcs correspond to those 
exposed arcs of the matching Wi+2,j-2 that belong (resp., 
do not belong) to Wij-2- 

Since points xg-\ and Xk+i belong to exposed arcs in 
the matching Wi+2,j-2, the (possibly empty) parts of this 
matching that correspond to points Xi + 2 < • • • < xi-\ 



W i+ 2,j = W l+2 ,e>-i + We>+i : k>-i + W k >+i,j-2 

+ w{xi',Xj) + w(xk',Xj-i); (A7) 

in particular, if £' = j — 1 and k' = j, then Wi+2,j = 

W i+ 2,j-2 +w(Xj-l,Xj). 

Suppose that Xk < Xf . Taking into account that 
Xk, Xk+i, X£,/-i, and Xf all belong to exposed arcs 
in Wi+2,j—2, we can write 



W k +l,j-2 = Wk+l t i'-l + Wi/,j-2, 

W i+2 ,i'-i = W i+2 ,k + Wk+i.r-i 



(A8) 



and 



W i+2 j~2 = W i+ 2,k + Wah-M'-i + Wfj-2. (A9) 

Substituting (f^8|) into (fA^jl and (|AT|) and taking into 
account (|A9jl . we obtain 

WiJ-2 + W i+ 2,j - W i+ 2,j-2 = w{Xi,X k ) + w(Xi+l,Xt) 

+ Wi+2,e-i + Wt+i,k-i + Wk+i,i'-i 
+ w(xe,Xj) + W t +i.k'-i +w(x k ',Xj-i) + Wk'+i,j-2- 

The right-hand side of this expression corresponds to a 
matching that coincides with W^j—2 on [xjjXfc], with 
Wi + 2,j-2 on [xk+i,X£'-i], and with W i+ ij on [xi',Xj]. 
By optimality, this matching cannot be improved on any 
of these three segments and is therefore optimal among 
all matchings in which Xi and Xj belong to different ex- 
posed arcs. 

It follows that under the assumption that x k < xu the 
expression in the right-hand side of (|A3[) gives the min- 
imum weight of all matchings on x, < Xj+i < ■ ■ ■ < Xj . 
Moreover, the only possible candidates for the optimal 
matching are those constructed above: one that corre- 
sponds to w{xi,Xj) + Wi+ij-i and one given by the 
right-hand side of the latter formula. 

It remains to consider the case x k > Xf . Since Xk 7^ 
xgi for parity reasons, it follows that x k > xc] now a 
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construction similar to the above yields a matching which 
corresponds to Wij-2 + Wi+2,j ~ Wi+2j-2 and in which 
the arcs (xi,Xk) and (x&,Xj) arc crossed. Uncrossing 
them leads to a matching with strictly smaller weight, 



which contains the arc (xi,Xj) and therefore cannot be 
better than w(xi, Xj) + Wi+ij—i. This means that (|A3|) 
holds in this case too with Wij = w(xi,Xj) + Wi+ij-i. 
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